16 research outputs found

    Blending Generative Adversarial Image Synthesis with Rendering for Computer Graphics

    Full text link
    Conventional computer graphics pipelines require detailed 3D models, meshes, textures, and rendering engines to generate 2D images from 3D scenes. These processes are labor-intensive. We introduce Hybrid Neural Computer Graphics (HNCG) as an alternative. The contribution is a novel image formation strategy to reduce the 3D model and texture complexity of computer graphics pipelines. Our main idea is straightforward: Given a 3D scene, render only important objects of interest and use generative adversarial processes for synthesizing the rest of the image. To this end, we propose a novel image formation strategy to form 2D semantic images from 3D scenery consisting of simple object models without textures. These semantic images are then converted into photo-realistic RGB images with a state-of-the-art conditional Generative Adversarial Network (cGAN) based image synthesizer trained on real-world data. Meanwhile, objects of interest are rendered using a physics-based graphics engine. This is necessary as we want to have full control over the appearance of objects of interest. Finally, the partially-rendered and cGAN synthesized images are blended with a blending GAN. We show that the proposed framework outperforms conventional rendering with ablation and comparison studies. Semantic retention and Fr\'echet Inception Distance (FID) measurements were used as the main performance metrics

    Vision Language Models in Autonomous Driving and Intelligent Transportation Systems

    Full text link
    The applications of Vision-Language Models (VLMs) in the fields of Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By integrating language data, the vehicles, and transportation systems are able to deeply understand real-world environments, improving driving safety and efficiency. In this work, we present a comprehensive survey of the advances in language models in this domain, encompassing current models and datasets. Additionally, we explore the potential applications and emerging research directions. Finally, we thoroughly discuss the challenges and research gap. The paper aims to provide researchers with the current work and future trends of VLMs in AD and ITS

    3D Understanding of Deformable Linear Objects: Datasets and Transferability Benchmark

    Full text link
    Deformable linear objects are vastly represented in our everyday lives. It is often challenging even for humans to visually understand them, as the same object can be entangled so that it appears completely different. Examples of deformable linear objects include blood vessels and wiring harnesses, vital to the functioning of their corresponding systems, such as the human body and a vehicle. However, no point cloud datasets exist for studying 3D deformable linear objects. Therefore, we are introducing two point cloud datasets, PointWire and PointVessel. We evaluated state-of-the-art methods on the proposed large-scale 3D deformable linear object benchmarks. Finally, we analyzed the generalization capabilities of these methods by conducting transferability experiments on the PointWire and PointVessel datasets

    Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer

    Full text link
    Advanced driver assistance and automated driving systems rely on risk estimation modules to predict and avoid dangerous situations. Current methods use expensive sensor setups and complex processing pipeline, limiting their availability and robustness. To address these issues, we introduce a novel deep learning based action recognition framework for classifying dangerous lane change behavior in short video clips captured by a monocular camera. We designed a deep spatiotemporal classification network that uses pre-trained state-of-the-art instance segmentation network Mask R-CNN as its spatial feature extractor for this task. The Long-Short Term Memory (LSTM) and shallower final classification layers of the proposed method were trained on a semi-naturalistic lane change dataset with annotated risk labels. A comprehensive comparison of state-of-the-art feature extractors was carried out to find the best network layout and training strategy. The best result, with a 0.937 AUC score, was obtained with the proposed network. Our code and trained models are available open-source.Comment: 8 pages, 3 figures, 1 table. The code is open-sourc
    corecore